home
***
CD-ROM
|
disk
|
FTP
|
other
***
search
/
Skunkware 5
/
Skunkware 5.iso
/
man
/
cat.1
/
waisindex.1
< prev
next >
Wrap
Text File
|
1995-07-27
|
9KB
|
199 lines
WWWWAAAAIIIISSSSIIIINNNNDDDDEEEEXXXX((((1111)))) TTTThhhhiiiinnnnkkkkiiiinnnngggg MMMMaaaacccchhhhiiiinnnneeeessss ((((SSSSuuuunnnn MMMMaaaayyyy 11110000 1111999999992222)))) WWWWAAAAIIIISSSSIIIINNNNDDDDEEEEXXXX((((1111))))
NNNNAAAAMMMMEEEE
waisindex - Indexes files
SSSSYYYYNNNNOOOOPPPPSSSSIIIISSSS
wwwwaaaaiiiissssiiiinnnnddddeeeexxxx [ -d index_filename ] [ -a ] [ -r ]
[ -mem mbytes ] [ -register ] [ -export ] [ -e [ file ] ]
[ -l log_level ] [ -pos | -nopos ] [ -nopairs | -pairs ]
[ -nocat ] [ -T type ] [ -t type ] [ -contents |
-nocontents ] filename filename ...
DDDDEEEESSSSCCCCRRRRIIIIPPPPTTTTIIIIOOOONNNN
wwwwaaaaiiiissssiiiinnnnddddeeeexxxx creates an index of the words in files so that
they can be searched quickly (see waissearch). The index
takes about as much disk space as the original text. It
also creates a new source structure named index_filename.src
if none exists.
OOOOPPPPTTTTIIIIOOOONNNNSSSS
----dddd _i_n_d_e_x__f_i_l_e_n_a_m_e
This is the base filename for the index files.
Therefore if /usr/local/foo is specified, then the
index files will be called /usr/local/foo.dct etc.
The index should be stored on the local file
system of the machine running waisindex. It works
over NFS, but it is much slower.
----aaaa Append this index to an existing one. Useful for
incremental additions or updates. This will only
add onto an index, so that if a file has changed,
it will get reindexed, but the old entries will
not be purged. Therefore, to save space, it is a
good idea to reindex the whole set of files
periodically.
----rrrr Recursively index subdirectories.
-mmmmeeeemmmm How much main memory to use during indexing. This
variable will have a large effect on how fast
indexing is done.
----rrrreeeeggggiiiisssstttteeeerrrr Register this database with the directory of
servers. You are encouraged to register
databases, but only ones that will be consistently
running. The directory of servers is available to
anyone that is on the internet or can phone in.
----eeeexxxxppppoooorrrrtttt This causes the resulting source description file
to include the host-name and tcp-port for use by
the clients. Otherwise the file contains no
connection information, and is expected to be used
only for local searches.
Page 1 (printed 7/27/95)
WWWWAAAAIIIISSSSIIIINNNNDDDDEEEEXXXX((((1111)))) TTTThhhhiiiinnnnkkkkiiiinnnngggg MMMMaaaacccchhhhiiiinnnneeeessss ((((SSSSuuuunnnn MMMMaaaayyyy 11110000 1111999999992222)))) WWWWAAAAIIIISSSSIIIINNNNDDDDEEEEXXXX((((1111))))
----eeee [ _f_i_l_e_n_a_m_e ]
Redirect error output to pathname, if supplied, or
to /dev/null. Error output defaults to stderr,
unless -s is selected, in which case it defaults
to /dev/null.
----llll _l_o_g__l_e_v_e_l
set logging level. Currently only levels 0, 1, 5
and 10 are meaningful: Level 0 means log nothing
(silent). Level 1 logs only errors and warnings
(messages of HIGH priority), level 5 logs messages
of MEDIUM priority (like indexing filename info).
Level 10 logs everything.
----ppppoooossss ((((----nnnnooooppppoooossss))))
Include (don't include - default) word position
information in the index. This will increase the
index size, but will allow search engines to do
proximity.
----nnnnooooppppaaaaiiiirrrrssss ((((----ppppaaaaiiiirrrrssss))))
Don't build (build - the default) word pairs from
consecutive capitalized words.
----nnnnooooccccaaaatttt Inhibits the creation of a catalog. This is
useful for databases with a large number of
documents, as the catalog contains 3 lines per
document.
----ccccoooonnnntttteeeennnnttttssss ((((----nnnnooooccccoooonnnntttteeeennnnttttssss))))
Include (exclude) the contents of the file from
the index. The filename and header will still be
indexed. Default is type depedant.
----TTTT ttttyyyyppppeeee Sets the TYPE of the document to "type".
----tttt _t_y_p_e This is the format of files that are handled by
waisindex. It is easy to parse a different
format, but that has to be done by changing the
source (ircfiles.c). To find out the list of
currently known types, execute the waisindex
command with no arguments and it will list them.
ffffiiiilllleeeennnnaaaammmmeeee ffffiiiilllleeeennnnaaaammmmeeee............
These are the files that will be indexed according
to the arguments above. To insure the files are
registered in the filename table correctly, it is
advised that these be full paths (beginning with a
/). If the database is to be used from a machine
other than the machine on which the index is
created, this should be a machine-independant
path.
Page 2 (printed 7/27/95)
WWWWAAAAIIIISSSSIIIINNNNDDDDEEEEXXXX((((1111)))) TTTThhhhiiiinnnnkkkkiiiinnnngggg MMMMaaaacccchhhhiiiinnnneeeessss ((((SSSSuuuunnnn MMMMaaaayyyy 11110000 1111999999992222)))) WWWWAAAAIIIISSSSIIIINNNNDDDDEEEEXXXX((((1111))))
SSSSEEEEEEEE AAAALLLLSSSSOOOO
waissearch(1), waisserver(1), waissearch-gmacs(1), xwais(1),
xwaisq(1)
Wide Area Information Servers Concepts by Brewster Kahle.
Brewster@think.com
DDDDIIIIAAAAGGGGNNNNOOOOSSSSTTTTIIIICCCCSSSS
The diagnostics produced by the waisindex are meant to be
self-explanatory.
BBBBUUUUGGGGSSSS
It temporarily takes twice the space it needs for an index.
Due to some compile time constants the document table is
limited to 16 Megabytes. This limits the indexer to
databases with headlines that add up to less than 16
megabytes (since thats the principal component of the
table). This is typically a problem for database types
where a record is essentially a headline (one_line, archie).
See the note in ir/README in the wais distribution for more
detail.
Page 3 (printed 7/27/95)